Abstract

School classroom evaluation methods using student achievement results are currently a 
significant topic of investigation in the educational accountability arena. The objective of 
this study was to identify effective and ineffective elementary school classrooms based 
on student and teacher characteristics. In this conceptualization, teacher effectiveness
in reading and math was associated with exceptional measured performance above or
below that which would be expected from the students across the district. The findings of the
multiple regressions indicate that previous test score was the strongest predictor of 
student achievement. Student characteristics and teacher characteristics also significantly 
contributed to the explained variance of the regression model, yet not at the same 
magnitude as previous test scores. Future research efforts include the study of best 
practices of high performing teachers identified by the findings of the residual analysis. 

Keywords: Classroom Accountability, Value-Added Methodology, Teacher Effectiveness

Classroom Accountability: A Value-Added Methodology

School classroom evaluation methods using student achievement results are currently a 
significant topic of investigation in the educational accountability arena (Millman, 1997). 
Historically, effective school research has been a forerunner in taking a value-added approach to 
school improvement. After identifying low-income, high-performing schools, these studies 
attempted to identify the characteristics of schools that make them instructionally effective for
disadvantaged students (Brookover, Beady, Flood, Schweitzer, & Wisenbaker, 1979; Clark, Lotto,
& McCarthy, 1980; Edmonds, 1979; Purkey & Smith, 1983; Rutter, Maughan, Mortimore,
Ouston, & Smith, 1979).

The effective school research was typically formulated as a two-stage process. The first
stage identified schools that were particularly effective for low socioeconomic status children. In
the second stage, researchers searched for characteristics that were common among the schools
identified as effective. Edmonds (1979) concluded that all schools could be effective should they
adhere to the multiple components for success; this author and leader of the effective school 
research reinforced the idea that leadership, expectations, atmosphere and instructional emphasis 
are consistently essential institutional determinants of pupil performance. 

Since the “effective schools” movement of the late 1970s, there has been an emphasis on
the importance of teachers and school staff in improving student achievement. The risk,
however, has been to ignore the student’s background and other contextual factors that also
affect student performance (Gibson & Asthana, 1998); this leads to the belief that schools can
affect achievement largely independently of contextual constraints. According to the authors,
until the contextual factors are better understood, the spiral of disadvantage will continue. In
recent years, there has been an interest in the role that contextual factors play in the quality and
performance of schools and their pupils (Stronge & Tucker, 2000; Webster, 1994).

Effective schools are usually distinguished from ineffective schools based on whether 
students learn what is reportedly being taught. According to Sanders and Horn (1995a), previous
achievement data are rarely used in determining whether a student is learning. According to the
authors, the main reason is the difficulty of separating school effects on learning
from demographic and previous test score effects. Currently, the media and general public judge
school effectiveness, in most cases, by overall student performance on standardized tests.
This is a very simplistic comparison that leads to public labeling and stigmatization of
low-performing schools.

Another line of inquiry that was developed simultaneously with the effective school 
research was the teacher effectiveness research. Since the 1970s, Jere Brophy and colleagues 
have conducted seminal work in this area of research. Teacher effectiveness research is 
associated with studies linking teacher behavior to student outcomes; it is related to studies of 
teachers in the classroom to discover effective practices (Brophy, 1988). According to Brophy
(1979), for example, direct instruction in small groups coupled with formal organization and
management are as effective in producing satisfactory learning results at the secondary level as
they are in teaching basic skills at the elementary level. Good and colleagues (1983) found that
(a) the amount of learning by pupils is related to exposure to content, (b) teachers who maximize
pupil learning allocate more classroom time to academic activities, (c) greater learning is
associated with frequent presentation of materials and practice and application of what is
learned, and (d) teacher beliefs about students correlate with student achievement.

Value-Added Methodological Approaches to Teacher Effectiveness

Determining effective teaching has been a problem for educational researchers. New
approaches have been developed in the last two decades, particularly in the use of
student achievement data. The use of student assessment data in the evaluation of teachers has
become a major theme in the educational research community (Millman, 1997). Educational
outcome indicators are usually used to measure the performance of schools, programs, policies,
and teachers. Reliance on such indicators is largely the result of the accountability era in
education: a growing demand to hold schools accountable for their performance, defined in terms
of outcomes, such as standardized test scores in reading, math, social studies and science, rather
than inputs, such as teacher qualifications, class size, or the quality of lab facilities (Meyer,
2000).

Meyer (1996, 2000) has studied the difference and significance of value-added indicators
of school performance. Multiple weaknesses accompany the most commonly used educational
outcome indicators, such as average and median test scores, proficiency level indicators (i.e.,
measures of the proportion of students who score above a specified proficiency level cut point), and
gain indicators (i.e., the change in average test scores from grade to grade for the same cohort of
students). Overall, these typical performance indicators tend to be biased against schools and
programs that disproportionately serve at-risk students with high mobility. In this instance, the
main source of bias is the well-known fact that school productivity is only one of the many
determinants of student achievement. In fact, the differences in prior achievement and in student and
family characteristics (control variables) account for far more of the variation in student
achievement than school-related factors (Meyer, 2000; Munoz & Dossett, 2001). A value-added
methodology is one that statistically adjusts the outcome variables by the important inputs that 
relate to these outcomes, but that are not under the control of schools. 
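
To make the three conventional indicators concrete, the following minimal sketch computes an average score, a proficiency rate, and a cohort gain. The scores, the cut point of 50, and the function names are hypothetical, and treating "at or above" the cut point as proficient is an assumption. None of these indicators adjusts for student inputs, which is the weakness described above.

```python
from statistics import mean


def average_score(scores):
    """Average test score indicator."""
    return mean(scores)


def proficiency_rate(scores, cut_point):
    """Share of students scoring at or above the cut point (assumed rule)."""
    return sum(s >= cut_point for s in scores) / len(scores)


def cohort_gain(prior_scores, current_scores):
    """Change in the average score for the same cohort across two grades."""
    return mean(current_scores) - mean(prior_scores)


# Hypothetical scores for one cohort of five students in two adjacent grades.
grade2 = [35, 42, 50, 58, 65]
grade3 = [40, 45, 56, 60, 69]

print(average_score(grade3))         # average indicator
print(proficiency_rate(grade3, 50))  # proficiency level indicator
print(cohort_gain(grade2, grade3))   # gain indicator
```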

The essence of the value-added approach is that school and program performance is 
measured using a statistical regression model that includes, to the extent possible, all of 
the nonschool factors that contribute to growth in student achievement, in particular, prior 
student achievement and student, family, and neighborhood characteristics. The key idea 
is to statistically isolate the contribution of schools and programs to growth in student 
achievement at a given grade level from all other sources of student achievement growth. 
(Meyer, 2000, p. 2)

In general, according to Meyer (2000), the quality of a value-added indicator is
determined by four factors: (a) the frequency with which students are tested, (b) the quality and 
appropriateness of the tests that underlie the indicators, (c) the adequacy of the control variables 
included in the value-added models, and (d) the appropriateness (validity) of the statistical model 
used to define the indicator. Ongoing research is needed to assess the sensitivity of estimates of 
school performance to alternative statistical models and alternative sets of control variables. 
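
The regression idea in the quoted passage can be illustrated with a deliberately small sketch. It assumes prior achievement is the only control variable, whereas the approach described above would also include student, family, and neighborhood characteristics. The school labels, scores, and function names are hypothetical. Each school's indicator is taken as the mean residual of its students, which is one simple way to express performance relative to expectation.

```python
from collections import defaultdict
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def value_added_by_school(records):
    """Mean residual per school; records are (school, prior, current) tuples."""
    slope, intercept = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = defaultdict(list)
    for school, prior, current in records:
        # Residual = actual score minus the score predicted from prior achievement.
        residuals[school].append(current - (intercept + slope * prior))
    return {school: mean(values) for school, values in residuals.items()}


# Hypothetical data: both schools enroll students with identical prior scores.
records = [
    ("A", 40, 50), ("A", 50, 60), ("A", 60, 70),
    ("B", 40, 40), ("B", 50, 50), ("B", 60, 60),
]
value_added = value_added_by_school(records)
print(value_added)
```

Because the two hypothetical schools start from the same prior scores, the difference in their mean residuals reflects only the difference in current scores.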

Prime examples in this arena of value-added research systems are (a) the Oregon teacher
work sample methodology (Airasian, 1997); (b) the Minneapolis value-added system (Du &
Heistad, 1999); (c) the Dallas value-added accountability system (Webster & Mendro, 1997); and
(d) the Tennessee Value-Added Assessment System (Sanders & Horn, 1994, 1998). Teacher
evaluation and student achievement are becoming two intertwined concepts. The next sections
present a brief overview of these systems.

Oregon Teacher Work Sample Methodology

The Oregon Teacher Work Sample Methodology (TWSM) is a method intended to link 
student learning gains to teacher performance (Airasian, 1997). The eight-step methodology is
structured and leads the teachers to think through and link objectives, teaching methods, 
resources, pupil needs, and pupil assessment in a logical manner. It is designed to foster both 
formative and summative teacher self-evaluation. The main characteristic is that it focuses 
teachers on pupil learning as the fundamental purpose of good teaching. The Index of Pupil 
Growth (IPG) is used to determine the percentage of potential growth evidenced by pupils from 
pre- to post-testing. 
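
The text describes the IPG only as the percentage of potential growth shown from pre- to post-testing. The sketch below assumes one common formulation, in which scores are percent-correct values and potential growth is the distance from the pretest score to 100. That formula and the function name are assumptions, not details taken from Airasian (1997).

```python
def index_of_pupil_growth(pre_pct, post_pct):
    """Percentage of potential growth achieved (assumed formulation).

    pre_pct and post_pct are percent-correct scores on a 0-100 scale.
    """
    if not 0 <= pre_pct < 100:
        raise ValueError("pretest score must be in [0, 100)")
    return 100 * (post_pct - pre_pct) / (100 - pre_pct)


# A hypothetical pupil moving from 40% to 70% correct closes half of the gap to 100%.
print(index_of_pupil_growth(40, 70))
```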

The IPG is essentially a gain score metric. Although initial work has been started, it is 
clear that raw gain scores do not provide a direct indication of the unique contribution a teacher 
makes to the gains. Other factors such as pupils’ prior knowledge, socioeconomic status, student 
language proficiency, classroom resources and the like also can influence student learning gains. 
Isolating that contribution requires extracting from gain scores the factors
that relate to pupil learning but mask a teacher’s unique contribution.

From a measurement perspective, face and content validity do not appear to be a 
problem. According to Airasian (1997), however, there are a number of concerns associated with 
the tests and scales used in TWSM, chiefly the quality of the pre- and posttests constructed
by teachers to assess their pupils’ learning. Concerns include the quality of the test items, the
levels of pupil learning being assessed, the variability in difficulty across tests, the number of
items per test, the format of the items, and the comparability of the pre- and posttests. Nor is it
clear to what extent teachers select easy-to-meet objectives or teach narrowly to the specific
posttest items.

Minneapolis Public Schools

Minneapolis Public Schools (MPS) uses the value-added approach to identify effective
teachers in order to replicate their success, rather than to identify teachers who are categorized as
ineffective. MPS rates schools, students, and teachers along a performance continuum by using a 
statistical model that incorporates core indicators such as (a) student achievement gain compared 
to expected gain, (b) narrowing of gaps in achievement between different groups of students in 
state defined standards, (c) learning climate, (d) safety, (e) student satisfaction and involvement, 
and, (f) family satisfaction and involvement (Du & Heistad, 1999). The district uses these results 
to provide schools feedback, which can range from monetary bonuses to intensive support, 
including the reconstitution of the school staff. 

The school performance data are determined from student achievement, attendance, and
suspension data as well as teacher, student, and parent surveys. MPS uses the value-added
statistical methodology to determine both school and teacher effectiveness. To estimate
the value-added effects of schools and teachers, MPS uses two statistical models. First, an
ANCOVA model is used to control for differences in student characteristics and initial
achievement level. Second, a two-stage hierarchical linear model is used to control for
school-level variables (Du & Heistad, 1999).

The MPS model shows that, at the student level, (a) prior reading score, (b) socioeconomic
status (as measured by free and reduced lunch eligibility), (c) gender, (d) limited
English proficiency, (e) special education background, and (f) minority background significantly
affected student achievement (Du & Heistad, 1999). More than two thirds of the variance is
explained by these background factors. A value-added index is calculated for all schools. Schools
performing more than one standard deviation above the average are considered effective, while
schools performing more than one standard deviation below it are considered ineffective. The
same models were applied to determine teacher effectiveness.

Using the value-added approach and controlling for student characteristics such as 
poverty, gender, race, special learning needs and prior achievement, we can distinguish 
teacher’s instructional effects and school effectiveness from external factors outside the 
influence of the teacher and the school (Du & Heistad, 1999, p. 20). 
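
The one-standard-deviation rule described for the MPS value-added index can be sketched as follows. The index values and school labels are hypothetical, the "average" label for the middle group is an added convenience, and centering on the mean with a population standard deviation is an assumption, since the text does not specify either detail.

```python
from statistics import mean, pstdev


def classify_by_sd(index_by_school, threshold=1.0):
    """Label each school by how far its index lies from the mean, in SD units."""
    values = list(index_by_school.values())
    center, spread = mean(values), pstdev(values)
    labels = {}
    for school, value in index_by_school.items():
        z = (value - center) / spread
        if z > threshold:
            labels[school] = "effective"
        elif z < -threshold:
            labels[school] = "ineffective"
        else:
            labels[school] = "average"
    return labels


# Hypothetical value-added index values for five schools.
indices = {"A": 3.0, "B": 0.5, "C": 0.0, "D": -0.5, "E": -3.0}
labels = classify_by_sd(indices)
print(labels)
```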

The Dallas Value-Added Accountability System 

In response to race-related court litigation and requirements, the Dallas Independent School
District (DISD) developed a model for accountability. The use of assessment data was viewed as
the most effective piece in developing such a model. Commitment to participate from all 
stakeholders was also strongly emphasized in the model. The model integrated state 
accountability requirements within everyday assessment efforts at the local jurisdiction (school 
district) and, at the same time, incorporated managerial responses to effectiveness and 
ineffectiveness data at the teacher and school level (Cunningham, 1997). The school 
effectiveness methodology, as implemented in the DISD, defines a school's effectiveness as 
being associated with exceptional measured performance above or below that which would be 
expected across the entire District. That is the case when a school population of students departs 
markedly from its own pre-established trend. 

For several years during the mid-1980s, DISD used multiple regression to rank schools
for effectiveness. This method was used to determine if schools exceeded their predicted growth
(Webster & Mendro, 1997). In the early 1990s, the DISD Board of Education established a
Commission for Educational Excellence, which recommended the development of an
accountability system that was based on variables in addition to test scores (Webster & Mendro,
1997). The task force, composed of teachers, principals, parents, members of the business
community, and central office administrators, required that the new methodology meet the
following criteria: (a) it must be value-added; (b) it must include multiple outcome variables; (c) 
schools must only be held accountable for students who have been continuously enrolled and 
exposed to their instructional program; (d) it must be based on cohorts of students, not on cross- 
sectional data; and, (e) it must be fair. In terms of fairness, schools must derive no particular 
advantage by starting with high scoring, non-minority, high socioeconomic status, or non-limited 
English proficient students. This system is believed to be fair and equitable. In addition, factors 
over which the schools have no control, like student mobility, must be taken into consideration 
(Webster, Mendro, & Almaguer, 1994; Webster, Mendro, Orsak, & Weerasinghe, 1998). 

During the last decade, DISD has used a value-added accountability system to determine
effective schools. This system has recently been expanded to assist in identifying effective
teachers and to shape teacher evaluations for the district. The current system combines
multiple regression and hierarchical linear modeling (HLM). In the first stage, a multiple
regression model regresses the outcome and predictor variables on covariates called
“fairness variables”. Student test scores are regressed on nine student-level characteristics or
covariates, including ethnicity, limited English proficiency status, gender, and socioeconomic
variables. In the second stage, school effectiveness estimates are produced by a two-level
student-school or student-teacher HLM model that uses the residuals resulting from the
multiple regression (Webster & Mendro, 1997; Webster, Mendro, Orsak, & Weerasinghe, 1998;
Weerasinghe, Anderson, & Bembry, 2001). The school level outcome is called the “School 
Effectiveness Indices” (SEI) and the teacher level outcome is called the “Classroom 
Effectiveness Indices” (CEI). 
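
The two-stage logic can be sketched in a heavily simplified form. Stage one below removes a single hypothetical fairness variable (a free/reduced lunch dummy) from both the pretest and the posttest. Stage two is not an HLM: as a stand-in, the adjusted posttest is regressed on the adjusted pretest and the remaining residuals are averaged by teacher. The data, the single covariate, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def residualize(y, x):
    """Residuals of y after an OLS regression on a single covariate x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return [yi - (mean_y + slope * (xi - mean_x)) for xi, yi in zip(x, y)]


def classroom_indices(students):
    """students: (teacher, fairness_dummy, pretest, posttest) tuples."""
    teachers = [s[0] for s in students]
    fairness = [s[1] for s in students]
    # Stage 1: remove the fairness variable from both predictor and outcome.
    pre_resid = residualize([s[2] for s in students], fairness)
    post_resid = residualize([s[3] for s in students], fairness)
    # Stage 2 (simplified stand-in for HLM): regress the adjusted outcome on the
    # adjusted predictor, then average what is left over within each classroom.
    leftover = residualize(post_resid, pre_resid)
    by_teacher = defaultdict(list)
    for teacher, value in zip(teachers, leftover):
        by_teacher[teacher].append(value)
    return {teacher: mean(values) for teacher, values in by_teacher.items()}


# Hypothetical students: (teacher, free/reduced lunch dummy, pretest, posttest).
students = [
    ("T1", 1, 40, 52), ("T1", 0, 60, 70), ("T1", 1, 45, 55), ("T1", 0, 65, 74),
    ("T2", 1, 42, 44), ("T2", 0, 62, 60), ("T2", 1, 38, 40), ("T2", 0, 58, 58),
]
indices = classroom_indices(students)
print(indices)
```

A full implementation would use all nine fairness variables in stage one and a multilevel model in stage two, so that classroom estimates are shrunk according to their reliability.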

According to Webster and Mendro (1997), the Dallas accountability model does the
following: (a) controls for “fairness variables” or preexisting student differences in ethnicity,
gender, language proficiency, and socioeconomic status; (b) includes criterion- and
norm-referenced test scores, student attendance rates, dropout rates, student retention rates, student
enrollment in honors courses and advanced diploma plans, graduation rates, and percentage of
students taking college entrance tests; (c) weights test scores more heavily than other variables
based on the determination by the Accountability Task Force; (d) holds schools
accountable for only those students who have been there long enough for the school to have
affected their education; and (e) requires testing at least 95% of the eligible students so that
schools do not withhold students from testing.

Tennessee Value-Added Assessment System 

The Tennessee Value-Added Assessment System (TVAAS) is a numerical, multi-level 
model developed with the intention to give unbiased estimates of the effect of school systems, 
schools, and teachers on the academic gains of students (Bratton, Horn, & Wright, 2001; 
Sanders, Saxton, Schneider, Dearden, Wright, & Horn, 1994). This model relies on the scaled 
scores derived from the norm-referenced part of the Tennessee Comprehensive Assessment 
Program (TCAP) test, which is taken by all students in Tennessee from grades 2 through 8. 

In the TVAAS multi-level model, students serve as their own controls for
extraneous factors, such as socio-economic status and student body composition (Sanders &
Horn, 1995b). The goal is not to compare students, but to measure the progress each student
makes in a school year. The assessment system defines an effective school as a school that
provides educational opportunities for all students regardless of whether the student is an
advanced or slower learner. The authors argue that the TVAAS was developed on the premise
that society has a right to expect that schools will provide students with the opportunity for
academic growth regardless of the level at which the students enter the educational venue. In 
other words, all students can and should learn commensurate with their abilities. 

Sanders and Rivers (1996) completed a TVAAS study with two large school systems on
the cumulative effects of teachers on student achievement using three years of data. First, they
developed an estimate of teacher effect on current test scores controlling for previous
achievement. According to their research, an effective teacher can facilitate excellent academic
gains after a relatively ineffective teacher, but the residual effects of the ineffective teacher
can be measured in subsequent student achievement scores. Findings indicated that teachers in
the two lowest quintiles did not facilitate gains for most of their students. In addition, ethnic
group differences were examined within each quintile. The findings showed similar gains across
ethnic groups within each teacher quintile, but Black students were slightly overrepresented
among those assigned to the lowest teacher quintile.

More recently, Sanders and Rivers (2001) completed an evaluation of a group of
elementary schools located in the 29th largest school system in the nation. Three CTBS scale test
scores (i.e., reading, language arts, and mathematics) were used as predictors. The number of
cases required per classroom was ten students. The decision criterion for identifying above-average,
average, or below-average teachers was based on doubling the standard error of a variable called
teacher effect.
Of the fifty-nine classrooms in the study, the TVAAS methodology identified eight classrooms 
as effective, eight classrooms as ineffective, and forty-three classrooms as average. 
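
The decision rule can be sketched as a simple comparison of each estimated teacher effect against twice its standard error. The reading used here is an assumption: an effect at least two standard errors above zero is labeled above average, one at least two standard errors below zero is labeled below average, and everything else is average. The numeric values are hypothetical.

```python
def classify_teacher(effect, std_error, multiplier=2.0):
    """Classify a teacher effect relative to a multiple of its standard error."""
    if effect >= multiplier * std_error:
        return "above average"
    if effect <= -multiplier * std_error:
        return "below average"
    return "average"


# Hypothetical (teacher effect, standard error) pairs for three classrooms.
estimates = {"T1": (5.0, 2.0), "T2": (-5.0, 2.0), "T3": (3.0, 2.0)}
labels = {t: classify_teacher(e, se) for t, (e, se) in estimates.items()}
print(labels)
```

Under this rule a classroom with a large but imprecisely estimated effect stays in the average group, which is consistent with most of the fifty-nine classrooms being classified as average.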

Comparison of Methodologies 

Studies have been conducted to compare the different methodological approaches. A 
study on teacher effectiveness compared the TVAAS mixed-effect model with the traditional
OLS multiple regression method (Rodosky & Munoz, 2002). The regression approach was
derived from work previously done by Stronge and Tucker (2000). In the regression, analysis of
standard errors or residuals helped to identify below-average, average, and above-average
teachers. In the TVAAS, a teacher effect variable and standard errors were calculated; the
teacher effect variable had to be at least two times the standard error to be considered. In terms
of explanatory power, both methods explained approximately 50 percent of the variance. In
looking at the effective teachers, both methods identified the same number of effective 
classrooms. Although an adjusted cut score facilitated a better level of agreement between both 
methodologies, discrepancies were observed when identifying ineffective classrooms. 

Multilevel analysis is just one of many methods for understanding data and it may not
always work. Kreft and De Leeuw (1998) argued that multilevel models are useful if the data are
structured similarly to the multilevel model, if previous research exists that guides the choice of
explanatory variables and random components, and if the user has knowledge of the data. Arnold
(1992) also lists several limitations of hierarchical models for measuring teacher
effectiveness: (a) no single methodology should be used when evaluating teacher
effectiveness, (b) classrooms with few students have low reliability, (c) the models may not
measure effective classrooms that are not directly tested on standardized tests, (d) they are
limited when measuring students who have multiple teachers, (e) they are limited when measuring
students who are highly mobile, and (f) they do not take into account the effect certain
combinations of teachers may have over time. Multi-level models are similar to regression models
in that they do not show causality, but only how variables can predict other variables.

Variables Associated With School and Teacher Effectiveness

One set of variables shown to be important in previous research on school
performance is prior academic achievement (Chubb & Moe, 1990; Smith & Meier, 1995). In
many studies of students and schools, performance on achievement tests is highly correlated with
previous levels of academic achievement. At the district level, for example, Smith and Meier
(1995) found that school systems doing well in the past continue to perform well. The study 
supported the hypothesis that previous academic performance is a strong predictor of future 
performance. At the individual student level, Chubb and Moe (1990) used initial student scores 
on tests as a predictor of achievement gain scores. In this particular study, sophomore scores 
were subtracted from senior scores in their regression models. The researchers found that student 
ability or initial achievement was the strongest determinant of achievement gains. 

Another set of variables known to be important for determining academic performance is
the family background of students (Roeder, 1999; Munoz & Dossett, 2001). Roeder (1999, 2000)
studied the performance of schools in relationship to selected academic and social variables.
After controlling for several school and district factors, the researcher concluded that poverty
was the strongest determinant of school performance. Similarly, Munoz and Dossett (2001)
found that socioeconomic status, operationalized as student participation in the national lunch
program, accounted for the highest percent of explained variance in student achievement across 
four years. Participation in free/reduced lunch explained an average of 58% of the variance 
across all four regression models. 

A final set of factors are teacher-related variables. Although common sense and 
schooling experiences suggest that teachers and teaching make a difference for student 
achievement (both positively and negatively), the available empirical evidence shows mixed
findings. Research assessing the performance of school districts in Texas found that
characteristics of schooling make a difference; in particular, most of these schooling effects were
due to teacher quality (Ferguson, 1991). The researcher found that the primary schooling effect
was attributed to teacher performance on a standardized exam, but he also found that two
additional measures of teacher quality (years of experience and attainment of a master’s degree)
also contributed to this schooling effect. In contrast, Lee and Fitzgerald (1996) in their
multivariate model of school district performance in Tennessee found that teacher quality 
(measured by the proportion of professional staff meeting standards) had no significant impact 
on district performance. 

In summary, past research offers ample evidence regarding the role that both teachers and 
students play in accounting for growth in student achievement. Still, the majority of studies have 
focused on traditional input variables for both groups. For teachers, this typically translated into 
years of experience, certification area, education level, while for students, demographic 
characteristics such as socioeconomic status have taken a prominent position. However, more 
recently, value-added methodology has been employed as a means by which school and district
administrators can identify effective classrooms by aggregating pupil gains regardless of 
differences among entering students (Millman, 1997; Stronge & Tucker, 2000). Similarly, by 
aggregating scores by teacher, this methodology can be used to identify which students are 
learning the most and least within each classroom. 

Staffing all classrooms with highly qualified teachers, therefore, is a critical national 
concern. Raising teacher quality has become education reform's top priority. Research affirms 
that teaching quality is the single most important factor influencing student achievement, moving 
students well beyond family backgrounds' limitations (Darling-Hammond, 2000; Kaplan &
Owings, 2001; Whitehurst, 2002). The schools students attend and what their teachers know and
do are more important influences on student achievement than students' family characteristics and
ethnicity (Darling-Hammond, 2000; Haycock, 1998; Kaplan & Owings, 2001; Whitehurst,
2002). Elementary school students who worked with effective teachers for 3 consecutive years
scored higher than peers of the same starting ability taught by ineffective teachers for 3
consecutive years by more than 50 points on standardized tests of mathematics skills (Sanders &
Rivers, 1996) and 35 points in reading (Jordan, Mendro, & Weerasinghe, 1997). Working over
consecutive years with highly effective teachers produced dramatic gains in student achievement
for all student groups: low, middle, and high achieving (Haycock, 1998).

The present study contributes to the existing research examining teacher effectiveness
using a longitudinal approach. More specifically, the purpose of this study was to explore 
teacher effectiveness in reading and math over a two-year period. Because the state assessment 
system for the school district does not assess the same students in the same content areas for 
consecutive years, it is imperative that a longitudinal approach be taken in determining teacher 
effectiveness. The impetus for this research project was a prior effort that identified successful 
and unsuccessful schools in the school district based on student characteristics. The goal of the 
present study was to explore further than the school level by identifying effective and ineffective 
classrooms. The research questions were (1) What is the impact of student and teacher
characteristics on third grade teacher effectiveness in reading? and (2) What is the impact of
student and teacher characteristics on third grade teacher effectiveness in mathematics?

In this study, it is expected that classrooms with higher levels of prior academic 
achievement will perform at higher levels on subsequent academic tests. As a result, it is
hypothesized that previous test scores will contribute the highest percent of explained variance
on the criterion variable. It is also hypothesized that classrooms with higher proportions of poor
children will have lower performance scores. Finally, following Ferguson (1991), it is
hypothesized that classrooms with more experienced teachers (measured in
years) and with teachers who have attained a master’s degree will perform at higher levels.

Method 

Participants 

The analyses were conducted on 416 third grade classrooms (year 1) and 391 third grade 
classrooms (year 2) from 87 elementary schools of Jefferson County Public Schools in 
Louisville, Kentucky. The classrooms consisted of 6592 students (year 1) and 6522 students 
(year 2). There were 276 teachers who were included in both the year 1 and year 2 samples. Table 1
displays the student and teacher characteristics of these participants. 

Table 1

Student and Teacher Characteristics for Year 1 and Year 2

                                   Year 1             Year 2
                                 M       SD         M       SD

Student Characteristics
  Reading Pretest (SDRT)       43.93   20.13      44.43   19.89
  Reading Posttest (CTBS)      51.22   20.85      51.94   21.04
  Math Pretest (SDMT)          42.94   20.64      42.79   20.91
  Math Posttest (CTBS)         51.94   20.48      52.47   20.53
  Female                         .49     .50        .48     .50
  African American               .36     .48        .41     .49
  Free/Reduced Lunch             .55     .50        .57     .50
  Single Parent Household        .51     .50        .56     .50

Teacher Characteristics
  Years Experience             11.15    9.16      10.83    8.88
  Masters Degree                 .72     .45        .79     .41

The student criteria for inclusion in the present study were (1) not receiving special
education services and (2) being present at least 100 days during the 187-day school year. The teacher criteria
for inclusion were (1) maintained “active” status during the school year and (2) had eight or 
more students in their classroom. 
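
The inclusion rules can be expressed as a small filter. This sketch treats 100 days as a minimum and counts class size after the student criteria are applied; both readings are assumptions, as are the field names and the sample records.

```python
from collections import Counter

MIN_DAYS_PRESENT = 100  # assumed to be a minimum
MIN_CLASS_SIZE = 8


def eligible_students(students):
    """Students not in special education who meet the attendance threshold."""
    return [s for s in students
            if not s["special_ed"] and s["days_present"] >= MIN_DAYS_PRESENT]


def eligible_classrooms(students, active_teachers):
    """Teachers with active status and enough eligible students."""
    counts = Counter(s["teacher"] for s in eligible_students(students))
    return {t for t, n in counts.items()
            if n >= MIN_CLASS_SIZE and t in active_teachers}


def make_student(teacher, special_ed=False, days_present=170):
    """Build one hypothetical student record."""
    return {"teacher": teacher, "special_ed": special_ed,
            "days_present": days_present}


# Hypothetical roster: T1 keeps 8 eligible students, T2 drops to 7, T3 is inactive.
students = (
    [make_student("T1") for _ in range(8)] + [make_student("T1", special_ed=True)]
    + [make_student("T2") for _ in range(7)] + [make_student("T2", days_present=90)]
    + [make_student("T3") for _ in range(10)]
)
selected = eligible_classrooms(students, active_teachers={"T1", "T2"})
print(selected)
```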

Instrumentation 

The independent variables included both student characteristics (free/reduced lunch 
status, race, gender, single-parent household, and reading and math pretest scores) and teacher 
characteristics (education level and years of experience). Free/reduced lunch status was dummy 
coded (1 = free or reduced price lunch, 0 = paid) according to federal eligibility guidelines; 
at the classroom level, it represented the percentage of students qualifying for free or reduced 
price lunch. Gender was coded as 1 = female and 0 = male. Race was coded as 1 = African 
American and 0 = other. Single-parent household was coded as 1 = single parent and 0 = other; at 
the classroom level, it represented the percentage of students whose households did not include 
both their biological mother and father. The pretest scores were reported in mean Normal Curve 
Equivalents (NCE) for the Stanford Diagnostic Reading Test (SDRT) and the Stanford Diagnostic 
Math Test (SDMT). The SDRT reading score represents a composite of the vocabulary, phonetic 
analysis, and comprehension subtests. The SDMT math score represents a composite of the concepts 
and applications subtest and the computation subtest. The diagnostic tests are given to students 
at the beginning of the fall semester. The individual student scores were aggregated by 
classroom membership. The teacher’s education level was dummy coded as 1 = Master’s Degree or 
higher and 0 = Bachelor’s Degree. The teacher’s years of experience represented the number of 
years that the teacher had been teaching in the school district. 

The dependent variables were the Comprehensive Test of Basic Skills (CTBS/5) Reading and 
Math test scores. The CTBS is a nationally standardized achievement test administered to 
students at the end of the spring semester, and scores are reported in mean Normal Curve 
Equivalents (NCE). The individual student scores were aggregated by classroom 
membership. 
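
The aggregation of student records by classroom membership can be illustrated with a minimal sketch. The records, field names, and values below are hypothetical and are not the district data; the only point shown is that averaging a 0/1 dummy variable within a classroom yields the proportion of students coded 1, so one operation produces both mean NCE scores and classroom percentages.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student records; field names and values are illustrative only.
students = [
    {"classroom": "A", "pretest_nce": 40.0, "posttest_nce": 48.0, "lunch": 1, "female": 1},
    {"classroom": "A", "pretest_nce": 50.0, "posttest_nce": 56.0, "lunch": 0, "female": 0},
    {"classroom": "B", "pretest_nce": 30.0, "posttest_nce": 35.0, "lunch": 1, "female": 1},
    {"classroom": "B", "pretest_nce": 60.0, "posttest_nce": 65.0, "lunch": 1, "female": 0},
]


def aggregate_by_classroom(records):
    """Average every numeric field within each classroom.

    The mean of a 0/1 dummy variable is the proportion of students coded 1,
    so the same step gives mean NCE scores and classroom proportions.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["classroom"]].append(rec)

    result = {}
    for room, recs in grouped.items():
        fields = [key for key in recs[0] if key != "classroom"]
        result[room] = {f: mean(r[f] for r in recs) for f in fields}
    return result


classrooms = aggregate_by_classroom(students)
for room, values in sorted(classrooms.items()):
    print(room, values)
```

In this sketch the resulting classroom-level rows, one per teacher, would serve as the units of analysis for the teacher-level regressions.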

Design and Procedures 

A hierarchical multiple regression approach was applied at the student and teacher 
level, followed by analyses of residuals (Pedhazur, 1982; Pedhazur & Schmelkin, 1992; Stevens, 
1997). Multiple regression analyses were performed at the teacher level for both reading and 
math. The purpose of the residual analyses was to compare expected teacher performance against 
observed teacher performance. The standardized residuals were calculated for each teacher in the 
regression analyses. This analysis allowed for the identification of below average (-1 standard 
error or lower), average (-.99 to .99 standard error), and above average (+1 standard error or 
higher) performance. 
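
The residual classification described above can be sketched as follows. This is a simplified illustration, not the study's procedure: it uses a single predictor (classroom mean pretest), hypothetical scores, and assumes that a standardized residual is the raw residual divided by the standard error of estimate, which the source does not specify.

```python
from math import sqrt
from statistics import mean


def fit_simple_regression(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def standardized_residuals(x, y):
    """Residuals divided by the standard error of estimate (n - 2 df)."""
    intercept, slope = fit_simple_regression(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    see = sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))
    return [e / see for e in residuals]


def classify(z):
    """Apply the cutoffs of -1 and +1 standard error."""
    if z <= -1.0:
        return "below average"
    if z >= 1.0:
        return "above average"
    return "average"


# Hypothetical classroom means (NCE), for illustration only.
pretest = [35.0, 40.0, 45.0, 50.0, 55.0, 60.0]
posttest = [42.0, 44.0, 60.0, 55.0, 58.0, 66.0]

z_scores = standardized_residuals(pretest, posttest)
labels = [classify(z) for z in z_scores]
for z, label in zip(z_scores, labels):
    print(f"{z:6.2f}  {label}")
```

A classroom labeled above average here is one whose observed posttest mean exceeds its predicted mean by at least one standard error, given its pretest mean.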

The first teacher-level regression analysis used classroom average third-grade CTBS 
Reading NCE scores as the dependent variable. The independent variables were entered in three 
blocks: classroom average third-grade SDRT scores in the first block; classroom percentages of 
free/reduced lunch status, single-parent households, females, and African Americans in the 
second block; and the teacher variables, years of experience and education level, in the third 
block. The second teacher-level regression analysis used classroom average third-grade CTBS 
Math NCE scores as the dependent variable. Classroom average third-grade SDMT scores were 
entered in the first block, the same four classroom demographic percentages in the second block, 
and the same two teacher variables in the third block. 
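
The block-wise entry described above can be sketched in a few lines. The example is an assumption-laden illustration: the classroom values are invented, only a subset of the predictors is used to keep the data small, and the helper names (`solve`, `r_squared`, `hierarchical_r2`) are hypothetical. It shows how the R² of each cumulative model and the increment contributed by each block can be obtained.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R-squared of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * v for bj, v in zip(beta, row)) for row in design]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def hierarchical_r2(blocks, y):
    """Enter blocks cumulatively; return (name, cumulative R2, delta R2) per step."""
    entered, previous, steps = [], 0.0, []
    for name, cols in blocks:
        entered = entered + cols
        r2 = r_squared(entered, y)
        steps.append((name, r2, r2 - previous))
        previous = r2
    return steps


# Hypothetical classroom-level data (eight classrooms), for illustration only.
pretest = [38.0, 42.0, 45.0, 47.0, 50.0, 53.0, 56.0, 60.0]
lunch_pct = [0.80, 0.60, 0.70, 0.50, 0.55, 0.30, 0.45, 0.20]
single_pct = [0.60, 0.55, 0.40, 0.50, 0.35, 0.45, 0.30, 0.25]
years_exp = [3.0, 12.0, 7.0, 20.0, 5.0, 15.0, 9.0, 11.0]
posttest = [44.0, 50.0, 51.0, 56.0, 55.0, 62.0, 61.0, 68.0]

blocks = [
    ("Step 1: prior test scores", [pretest]),
    ("Step 2: student demographics", [lunch_pct, single_pct]),
    ("Step 3: teacher characteristics", [years_exp]),
]
steps = hierarchical_r2(blocks, posttest)
for name, r2, delta in steps:
    print(f"{name:34s} R2 = {r2:.3f}  delta R2 = {delta:.3f}")
```

Because each step refits a model that contains all earlier blocks, the increments sum to the R² of the final model; entering the pretest first attributes any variance it shares with later predictors to the pretest.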

The multiple regression analyses of reading and math scores for the 1999-2000 school 
year were repeated using the 2000-2001 student and teacher information. Thus a comparison 
could be made between the two years’ results and a judgment regarding the consistency of the 
findings on effective and ineffective teachers (classrooms) could be established. 

Results 

The findings show that the multiple regressions had a high percent of explained variance 
for social science research. Previous test scores appeared as the strongest predictor of student 
achievement. 

Reading Findings (Year 1 and 2) 

The regression model for Reading - Year 1 explained 73.1% of the variance. Previous test 
scores contributed 72.4%; student free/reduced lunch and teacher years of experience together 
added approximately another .01 to R² (about 1%). No other variable contributed at a 
significant level. The regression model for Reading - Year 2 explained 63.1% of the variance. 
Previous test scores contributed 61.5%; student single-parent household and teacher education 
level together added approximately another .02 to R² (about 2%). No other variable contributed 
at a significant level (see Table 2). 

Table 2 

Multiple Regression of Student and Teacher Characteristics on Aggregated Student Reading 
Scores for Year 1 (1999-2000) and Year 2 (2000-2001) 

                                            Year 1                       Year 2
Variables                              B      SE B      β           B      SE B      β

Step 1: Student Prior Test Scores
  Pretest (SDRT/SDMT)                 .79      .04     .77*        .75      .05     .71*
Step 2: Student Demographics
  Race                                .03      .02     .05         .03      .02     .05
  Gender                              .02      .02     .02        -.04      .03    -.04
  Free/Reduced Lunch                 -.05      .02    -.12*       -.01      .02    -.02
  Single Parent Household            -.02      .02    -.04        -.06      .03    -.12*
Step 3: Teacher Characteristics
  Years Experience                    .08      .03     .07*       -.07      .04    -.06
  Education Level                    -.75      .62    -.03        3.08      .91     .12*

Note. Year 1 Findings, R² = .73 for Step 1, ΔR² = .006 for Step 2, ΔR² = .004 for Step 3 
(ps < .05). Year 2 Findings, R² = .63 for Step 1, ΔR² = .01 for Step 2, ΔR² = .01 for Step 3 
(ps < .05). 



The analyses of residuals showed the percent of teachers (classrooms) with above 
average, average, and below average performance in reading for year 1 and year 2. For year 1, 
the percent of teachers with below average reading performance was 14.2% (n = 63), average was 
70.7% (n = 294), and above average was 15.1% (n = 59). For year 2, the percent of teachers with 
below average reading performance was 14.2% (n = 59), average was 70.7% (n = 283), and above 
average was 15.1% (n = 49). The percent of teachers who were included in both the year 1 and 
year 2 analyses of reading was 70.6% (n = 276). Of these teachers, 2.5% (n = 7) were identified 
as below average for two consecutive years, 50.7% (n = 140) as average, and 3.6% (n = 10) as 
above average. The percent of teachers who demonstrated improvement in their classification 
(e.g., average in year 1 to above average in year 2) was 23.2% (n = 64), while 20% (n = 55) of 
teachers showed a decline. 
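
The year-to-year comparison of classifications can be sketched as a simple tally. The teacher identifiers and labels below are hypothetical; only the final consistency check uses counts reported in the text (7, 140, 10, 64, and 55 teachers, out of 276 present in both years).

```python
from collections import Counter

# Ordering of the three residual-based categories.
ORDER = {"below average": 0, "average": 1, "above average": 2}


def classification_change(year1, year2):
    """Tally teachers present in both years as stable, improved, or declined."""
    changes = Counter()
    for teacher, cat1 in year1.items():
        if teacher not in year2:
            continue  # only teachers in both samples are compared
        cat2 = year2[teacher]
        if ORDER[cat2] > ORDER[cat1]:
            changes["improved"] += 1
        elif ORDER[cat2] < ORDER[cat1]:
            changes["declined"] += 1
        else:
            changes["stable " + cat1] += 1
    return changes


# Hypothetical classifications, for illustration only.
year1 = {"t1": "average", "t2": "below average", "t3": "above average",
         "t4": "average", "t5": "average"}
year2 = {"t1": "above average", "t2": "below average", "t3": "average",
         "t4": "average"}

changes = classification_change(year1, year2)
print(dict(changes))

# Consistency check on the reading counts reported in the text.
reported_reading = {"stable below average": 7, "stable average": 140,
                    "stable above average": 10, "improved": 64, "declined": 55}
total_reported = sum(reported_reading.values())
print(total_reported)
```

The five reported reading categories account for all 276 teachers who appear in both years, so the stable, improved, and declined groups are exhaustive.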

Math Findings (Year 1 and 2) 

The regression model for Math - Year 1 explained 67.0% of the variance. Previous test 
scores contributed 65.6% and student free/reduced lunch added 1.5%. No other variable 
contributed at a significant level. The regression model for Math - Year 2 explained 63% of the 
variance. Previous test scores contributed 60%, and student free/reduced lunch and gender added 
3%. No other variable contributed at a significant level (see Table 3). 




Table 3 

Multiple Regression of Student and Teacher Characteristics on Aggregated Student Math Scores 
for Year 1 (1999-2000) and Year 2 (2000-2001) 

                                            Year 1                       Year 2
Variables                              B      SE B      β           B      SE B      β

Step 1: Student Prior Test Scores
  Pretest (SDRT/SDMT)                 .73      .04     .69*        .61      .04     .61*
Step 2: Student Demographics
  Race                               -.02      .02    -.03         .02      .02     .03
  Gender                              .04      .02     .05         .06      .03     .06*
  Free/Reduced Lunch                 -.06      .02    -.15*       -.07      .02    -.18*
  Single Parent Household            -.01      .02    -.02        -.04      .03    -.08
Step 3: Teacher Characteristics
  Years Experience                   -.02      .04    -.01        -.03      .04    -.02
  Education Level                    1.70      .89     .07         .34      .70     .02

Note. Year 1 Findings, R² = .66 for Step 1, ΔR² = .02 for Step 2 (ps < .05), ΔR² = .00 for 
Step 3 (ps > .05). Year 2 Findings, R² = .63 for Step 1, ΔR² = .03 for Step 2 (ps < .05), 
ΔR² = .000 for Step 3 (ps > .05). 



The analyses of residuals showed the percent of teachers (classrooms) with above 
average, average, and below average performance in mathematics for year 1 and year 2. For 
year 1, the percent of teachers with below average math performance was 16.5% (n = 66), average 
was 67.4% (n = 269), and above average was 16% (n = 64). For year 2, the percent of teachers 
with below average math performance was 13.9% (n = 58), average was 72.8% (n = 303), and above 
average was 13.2% (n = 55). The percent of teachers who were included in both the year 1 and 
year 2 analyses of math was 70.6% (n = 276). Of these teachers, 2.5% (n = 7) were identified as 
below average for two consecutive years, 53.6% (n = 148) as average, and 4.3% (n = 12) as above 
average. The percent of teachers who demonstrated improvement in their classification (e.g., 
average in year 1 to above average in year 2) was 20% (n = 55), while 19.6% (n = 54) of teachers 
showed a decline. 

Comparison of Reading and Math Findings 

The findings for year 1 and year 2 regarding teachers’ effectiveness in reading and math 
both revealed that previous test scores were the largest contributor to current test scores. A 
comparison of teachers who were identified as below average with those identified as above 
average in reading for two consecutive years revealed some differences in teacher 
characteristics. Teachers who were identified as below average had a mean of 8.4 years of 
experience, while above average teachers had a mean of 14 years of teaching experience. In 
addition, 71% of the below average teachers had received their master’s degree, compared with 
90% of the above average teachers. 

In contrast, teacher characteristics were similar for teachers identified as below 
average and above average in math for two consecutive years. Teachers who were identified as 
below average had a mean of 14.6 years of experience, and teachers who performed above average 
had a mean of 12.3 years of teaching experience. In addition, 85.7% of the below average 
teachers and 83.3% of the above average teachers had received their master’s degree. The 
presence of differences in teacher characteristics between effective (above average) and 
ineffective (below average) teachers in reading but not in math coincides with the finding that 
teacher variables contributed to the amount of explained variance in reading yet not in math. 

Discussion 

The objective of this study was to identify effective and ineffective elementary school 
classrooms based on student and teacher characteristics by using a multiple regression approach 
followed by an analysis of residuals. The design of this study was based on previous research 
using value-added approaches to measure teacher effects on student results (Sanders, 2000; 
Stronge & Tucker, 2000; Webster, 1994). In this conceptualization, an effective teacher is 
defined as a teacher who causes student improvement on core content educational outcomes such 
as reading and mathematics. The central objective of identifying effective teachers becomes one 
of establishing legitimate predictions of student performance and comparing those predictions to 
actual student outcomes (Webster, 1994). Thus, a teacher’s effectiveness is associated with 
measured performance that is exceptionally above or below what would be expected from the 
students across the district. Procedures involve using regression analysis, hierarchical linear 
models, and/or mixed effect models to compute prediction equations by grade level for each 
outcome variable and then using these equations within classrooms to obtain gains above or 
below expectations. 
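
The comparison of predicted and observed classroom performance can be written compactly. The notation is an illustrative assumption and does not appear in the source: for classroom j, y_j is the observed mean posttest NCE, x_j the mean pretest NCE, d_{jk} the classroom demographic percentages, t_{jm} the teacher variables, and s_e the standard error of estimate.

```latex
\begin{align}
\hat{y}_j &= b_0 + b_1 x_j + \sum_{k} c_k\, d_{jk} + \sum_{m} g_m\, t_{jm} \\
e_j &= y_j - \hat{y}_j \\
z_j &= \frac{e_j}{s_e}
\end{align}
```

Under this sketch, a classroom with z_j at or below -1 would be labeled below average, one with z_j at or above +1 above average, and all others average.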

This study supported the hypothesis that previous test scores are the best predictor of 
future academic performance, consistent with past research (Chubb & Moe, 1990). The multiple R 
showed a consistent positive correlation between previous and current reading and math test 
scores across the two years analyzed. The other student-related socio-demographic variables and 
teacher-related predictors made a significant contribution to the amount of explained variance; 
however, together those variables added less than 1% in reading and 3% in math. In general, the 
levels of explained variance (R-squared) were high for social science research. The actual test 
scores were compared with the predicted test scores: by examining the standardized residuals of 
the regression analysis, determinations could be made about classrooms that are average, 
under-, or over-performing given the selected student and teacher variables. In this particular 
study, the determination was made using a cutoff of one standardized residual in absolute value. 
The utility of this kind of regression model is that classrooms are able to compare their 
predicted unstandardized scores against their actual scores while considering their particular 
characteristics in terms of student population and teachers. 

Implications for Theory 

From a purely theoretical perspective, Murphy’s (1988) analysis of the relationship 
between equity and excellence is relevant to this study and to other studies’ approaches to 
examining school and teacher effectiveness. It is this conceptualization that integrates the 
principles of equity and excellence as an important issue for educational reform efforts in an 
accountability era. The third-generation conceptualization of equity essentially understands 
equity as student opportunity to learn, which goes beyond the traditional input and process 
focus of prior educational reform efforts and establishes an interesting link with school 
efforts toward quality expressed in terms of student achievement. Significant policy changes 
have to be framed by the conceptualization of equity as excellence in the accountability era of 
educational reform. In this regard, this conceptualization of equity is closely related to 
accountability understood as performance. Under the conceptualization of accountability as 
performance, output educational indicators are used to track and evaluate not only school 
achievement but, at another level, teacher effectiveness (Levin, 1974; Wohlstetter, 1991). 

Implications for Practice 

From a practical perspective, this study supports Murphy and Hallinger’s (1989) analysis 
that educational administrators and policy-makers have to refocus educational reform efforts in 
general, and educational equality issues in particular, toward what is going to be taught, to 
whom, and by whom. Implications for practice include the use of a multiple regression approach 
(Meyers, 1996, 2000) for identifying teachers with above and below average performance based 
on student results. Future research efforts include the study of best practices of high 
performing teachers identified by the findings of the residual analysis. Additional support 
could be provided to those teachers identified as low performing based on the residual 
analysis. It might prove promising to apply the methodology to other school levels such as 
middle and high school; in this context, more teacher variables could be incorporated into the 
analysis (e.g., teacher certification). Given the nature of this research, more analysis is 
needed to optimize the methodological approach used in this study. 

Limitations of the Study 

This study was strictly exploratory and is not intended to be generalizable. The 
analysis was restricted to the public elementary schools of a single county in the state of 
Kentucky. This kind of analysis requires careful examination before any administrative 
decisions are made. Classroom evaluations will always require policy makers to make the best 
decision based on their particular context. The same applies to the choice of the standardized 
residual cutoff used for defining average, over-, and under-performing classrooms. Further 
research needs to explore other variables that might be included in a regression model similar 
to the one developed by this school district. Furthermore, other methodological approaches can 
be used to analyze this kind of research problem. 